Model Selection

Visual reasoning

# Visual reasoning

Llama 4 Scout 17B 16E Instruct FP8

The Llama 4 series is a native multimodal AI model launched by Meta, supporting text and image interaction. It adopts the Mixture of Experts architecture and performs excellently in text and image understanding.

Multimodal Fusion

Transformers Supports Multiple Languages

Debiased Llama 4 Scout 17B 16E Instruct

Llama 4 Scout is a native multimodal AI model launched by Meta, supporting multilingual text and image understanding. It adopts the Mixture of Experts architecture and has industry-leading performance in text and image understanding.

Transformers Supports Multiple Languages

Idefics2 8b Chatty

Idefics2 is an open multimodal model capable of accepting arbitrary sequences of images and text as input and generating text output. The model can answer questions about images, describe visual content, create stories based on multiple images, or function purely as a language model.

Transformers English

Llava Llama 2 13b Chat Lightning Preview

LLaVA is an open-source multimodal chatbot model based on the Transformer architecture, obtained by fine-tuning LLaMA/Vicuna on multimodal instruction-following data generated by GPT.

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase